Bernoulli Process
   HOME

TheInfoList



OR:

In
probability Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speakin ...
and statistics, a Bernoulli process (named after
Jacob Bernoulli Jacob Bernoulli (also known as James or Jacques; – 16 August 1705) was one of the many prominent mathematicians in the Bernoulli family. He was an early proponent of Leibnizian calculus and sided with Gottfried Wilhelm Leibniz during the Le ...
) is a finite or infinite sequence of binary random variables, so it is a
discrete-time stochastic process In probability theory and related fields, a stochastic () or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appea ...
that takes only two values, canonically 0 and 1. The component Bernoulli variables ''X''''i'' are identically distributed and independent. Prosaically, a Bernoulli process is a repeated
coin flipping Coin flipping, coin tossing, or heads or tails is the practice of throwing a coin in the air and checking which side is showing when it lands, in order to choose between two alternatives, heads or tails, sometimes used to resolve a dispute betwe ...
, possibly with an unfair coin (but with consistent unfairness). Every variable ''X''''i'' in the sequence is associated with a
Bernoulli trial In the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is ...
or experiment. They all have the same
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
. Much of what can be said about the Bernoulli process can also be generalized to more than two outcomes (such as the process for a six-sided die); this generalization is known as the
Bernoulli scheme In mathematics, the Bernoulli scheme or Bernoulli shift is a generalization of the Bernoulli process to more than two possible outcomes. Bernoulli schemes appear naturally in symbolic dynamics, and are thus important in the study of dynamical sy ...
. The problem of determining the process, given only a limited sample of Bernoulli trials, may be called the problem of
checking whether a coin is fair In statistics, the question of checking whether a coin is fair is one whose importance lies, firstly, in providing a simple problem on which to illustrate basic ideas of statistical inference and, secondly, in providing a simple problem that can be ...
.


Definition

A Bernoulli process is a finite or infinite sequence of
independent Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s * Independ ...
random variables ''X''1, ''X''2, ''X''3, ..., such that * for each ''i'', the value of ''X''''i'' is either 0 or 1; * for all values of ''i'', the probability ''p'' that ''X''''i'' = 1 is the same. In other words, a Bernoulli process is a sequence of
independent identically distributed In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is us ...
Bernoulli trial In the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is ...
s. Independence of the trials implies that the process is
memoryless In probability and statistics, memorylessness is a property of certain probability distributions. It usually refers to the cases when the distribution of a "waiting time" until a certain event does not depend on how much time has elapsed already ...
. Given that the probability ''p'' is known, past outcomes provide no information about future outcomes. (If ''p'' is unknown, however, the past informs about the future indirectly, through inferences about ''p''.) If the process is infinite, then from any point the future trials constitute a Bernoulli process identical to the whole process, the fresh-start property.


Interpretation

The two possible values of each ''X''''i'' are often called "success" and "failure". Thus, when expressed as a number 0 or 1, the outcome may be called the number of successes on the ''i''th "trial". Two other common interpretations of the values are true or false and yes or no. Under any interpretation of the two values, the individual variables ''X''''i'' may be called
Bernoulli trial In the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is ...
s with parameter p. In many applications time passes between trials, as the index i increases. In effect, the trials ''X''1, ''X''2, ... ''X''i, ... happen at "points in time" 1, 2, ..., ''i'', .... That passage of time and the associated notions of "past" and "future" are not necessary, however. Most generally, any ''X''i and ''X''''j'' in the process are simply two from a set of random variables indexed by , the finite cases, or by , the infinite cases. One experiment with only two possible outcomes, often referred to as "success" and "failure", usually encoded as 1 and 0, can be modeled as a
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
. Several random variables and probability distributions beside the Bernoullis may be derived from the Bernoulli process: *The number of successes in the first ''n'' trials, which has a binomial distribution B(''n'', ''p'') *The number of failures needed to get ''r'' successes, which has a negative binomial distribution NB(''r'', ''p'') *The number of failures needed to get one success, which has a
geometric distribution In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions: * The probability distribution of the number ''X'' of Bernoulli trials needed to get one success, supported on the set \; * ...
NB(1, ''p''), a special case of the negative binomial distribution The negative binomial variables may be interpreted as random waiting times.


Formal definition

The Bernoulli process can be formalized in the language of
probability space In probability theory, a probability space or a probability triple (\Omega, \mathcal, P) is a mathematical construct that provides a formal model of a random process or "experiment". For example, one can define a probability space which models t ...
s as a random sequence of independent realisations of a random variable that can take values of heads or tails. The state space for an individual value is denoted by 2=\ .


Borel algebra

Consider the countably infinite direct product of copies of 2=\. It is common to examine either the one-sided set \Omega=2^\mathbb=\^\mathbb or the two-sided set \Omega=2^\mathbb. There is a natural
topology In mathematics, topology (from the Greek words , and ) is concerned with the properties of a geometric object that are preserved under continuous deformations, such as stretching, twisting, crumpling, and bending; that is, without closing ...
on this space, called the
product topology In topology and related areas of mathematics, a product space is the Cartesian product of a family of topological spaces equipped with a natural topology called the product topology. This topology differs from another, perhaps more natural-s ...
. The sets in this topology are finite sequences of coin flips, that is, finite-length
strings String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
of ''H'' and ''T'' (''H'' stands for heads and ''T'' stands for tails), with the rest of (infinitely long) sequence taken as "don't care". These sets of finite sequences are referred to as cylinder sets in the product topology. The set of all such strings form a sigma algebra, specifically, a
Borel algebra In mathematics, a Borel set is any set in a topological space that can be formed from open sets (or, equivalently, from closed sets) through the operations of countable union, countable intersection, and relative complement. Borel sets are nam ...
. This algebra is then commonly written as (\Omega, \mathcal) where the elements of \mathcal are the finite-length sequences of coin flips (the cylinder sets).


Bernoulli measure

If the chances of flipping heads or tails are given by the probabilities \, then one can define a natural
measure Measure may refer to: * Measurement, the assignment of a number to a characteristic of an object or event Law * Ballot measure, proposed legislation in the United States * Church of England Measure, legislation of the Church of England * Mea ...
on the product space, given by P=\^\mathbb (or by P=\^\mathbb for the two-sided process). In another word, if a
discrete random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
''X'' has a ''Bernoulli distribution'' with parameter ''p'', where 0 ≤ ''p'' ≤ 1, and its probability mass function is given by :pX(1)=P(X=1)=p and pX(0)=P(X=0)=1-p. We denote this distribution by Ber(''p''). Given a cylinder set, that is, a specific sequence of coin flip results omega_1, \omega_2,\cdots\omega_n/math> at times 1,2,\cdots,n, the probability of observing this particular sequence is given by :P( omega_1, \omega_2,\cdots ,\omega_n= p^k (1-p)^ where ''k'' is the number of times that ''H'' appears in the sequence, and ''n''−''k'' is the number of times that ''T'' appears in the sequence. There are several different kinds of notations for the above; a common one is to write :P(X_1=x_1, X_2=x_2,\cdots, X_n=x_n)= p^k (1-p)^ where each X_i is a binary-valued random variable with x_i= omega_i=H/math> in Iverson bracket notation, meaning either 1 if \omega_i=H or 0 if \omega_i=T. This probability P is commonly called the Bernoulli measure. Note that the probability of any specific, infinitely long sequence of coin flips is exactly zero; this is because \lim_p^n=0, for any 0\le p<1. A probability equal to 1 implies that any given infinite sequence has
measure zero In mathematical analysis, a null set N \subset \mathbb is a measurable set that has measure zero. This can be characterized as a set that can be covered by a countable union of intervals of arbitrarily small total length. The notion of null ...
. Nevertheless, one can still say that some classes of infinite sequences of coin flips are far more likely than others, this is given by the
asymptotic equipartition property In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of typical set used in theories of data compression. Roughly speaking, the th ...
. To conclude the formal definition, a Bernoulli process is then given by the probability triple (\Omega, \mathcal, P), as defined above.


Law of large numbers, binomial distribution and central limit theorem

Let us assume the canonical process with H represented by 1 and T represented by 0 . The law of large numbers states that the average of the sequence, i.e., \bar_:=\frac\sum_^X_ , will approach the expected value almost certainly, that is, the events which do not satisfy this limit have zero probability. The
expectation value In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a l ...
of flipping ''heads'', assumed to be represented by 1, is given by p. In fact, one has :\mathbb _i\mathbb( _i=1=p, for any given random variable X_i out of the infinite sequence of
Bernoulli trial In the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is ...
s that compose the Bernoulli process. One is often interested in knowing how often one will observe ''H'' in a sequence of ''n'' coin flips. This is given by simply counting: Given ''n'' successive coin flips, that is, given the set of all possible
strings String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
of length ''n'', the number ''N''(''k'',''n'') of such strings that contain ''k'' occurrences of ''H'' is given by the binomial coefficient :N(k,n) = =\frac If the probability of flipping heads is given by ''p'', then the total probability of seeing a string of length ''n'' with ''k'' heads is :\mathbb( _n=k = p^k (1-p)^ , where S_n=\sum_^X_i . The probability measure thus defined is known as the Binomial distribution. As we can see from the above formula that, if n=1, the ''Binomial distribution'' will turn into a ''Bernoulli distribution''. So we can know that the ''Bernoulli distribution'' is exactly a special case of ''Binomial distribution'' when n equals to 1. Of particular interest is the question of the value of S_ for a sufficiently long sequences of coin flips, that is, for the limit n\to\infty. In this case, one may make use of
Stirling's approximation In mathematics, Stirling's approximation (or Stirling's formula) is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James Stirling, though a related but less p ...
to the factorial, and write :n! = \sqrt \;n^n e^ \left(1 + \mathcal\left(\frac\right)\right) Inserting this into the expression for ''P''(''k'',''n''), one obtains the
Normal distribution In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu ...
; this is the content of the
central limit theorem In probability theory, the central limit theorem (CLT) establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution even if the original variables themsel ...
, and this is the simplest example thereof. The combination of the law of large numbers, together with the central limit theorem, leads to an interesting and perhaps surprising result: the
asymptotic equipartition property In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of typical set used in theories of data compression. Roughly speaking, the th ...
. Put informally, one notes that, yes, over many coin flips, one will observe ''H'' exactly ''p'' fraction of the time, and that this corresponds exactly with the peak of the Gaussian. The asymptotic equipartition property essentially states that this peak is infinitely sharp, with infinite fall-off on either side. That is, given the set of all possible infinitely long strings of ''H'' and ''T'' occurring in the Bernoulli process, this set is partitioned into two: those strings that occur with probability 1, and those that occur with probability 0. This partitioning is known as the Kolmogorov 0-1 law. The size of this set is interesting, also, and can be explicitly determined: the logarithm of it is exactly the
entropy Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...
of the Bernoulli process. Once again, consider the set of all strings of length ''n''. The size of this set is 2^n. Of these, only a certain subset are likely; the size of this set is 2^ for H\le 1. By using Stirling's approximation, putting it into the expression for ''P''(''k'',''n''), solving for the location and width of the peak, and finally taking n\to\infty one finds that :H=-p\log_2 p - (1-p)\log_2(1-p) This value is the Bernoulli entropy of a Bernoulli process. Here, ''H'' stands for entropy; do not confuse it with the same symbol ''H'' standing for ''heads''.
John von Neumann John von Neumann (; hu, Neumann János Lajos, ; December 28, 1903 – February 8, 1957) was a Hungarian-American mathematician, physicist, computer scientist, engineer and polymath. He was regarded as having perhaps the widest cove ...
posed a curious question about the Bernoulli process: is it ever possible that a given process is isomorphic to another, in the sense of the
isomorphism of dynamical systems In mathematics, a measure-preserving dynamical system is an object of study in the abstract formulation of dynamical systems, and ergodic theory in particular. Measure-preserving systems obey the Poincaré recurrence theorem, and are a special ...
? The question long defied analysis, but was finally and completely answered with the
Ornstein isomorphism theorem In mathematics, the Ornstein isomorphism theorem is a deep result in ergodic theory. It states that if two Bernoulli schemes have the same Kolmogorov entropy, then they are isomorphic. The result, given by Donald Ornstein in 1970, is important b ...
. This breakthrough resulted in the understanding that the Bernoulli process is unique and
universal Universal is the adjective for universe. Universal may also refer to: Companies * NBCUniversal, a media and entertainment company ** Universal Animation Studios, an American Animation studio, and a subsidiary of NBCUniversal ** Universal TV, a ...
; in a certain sense, it is the single most random process possible; nothing is 'more' random than the Bernoulli process (although one must be careful with this informal statement; certainly, systems that are mixing are, in a certain sense, 'stronger' than the Bernoulli process, which is merely ergodic but not mixing. However, such processes do not consist of independent random variables: indeed, many purely deterministic, non-random systems can be mixing).


Dynamical systems

The Bernoulli process can also be understood to be a
dynamical system In mathematics, a dynamical system is a system in which a function describes the time dependence of a point in an ambient space. Examples include the mathematical models that describe the swinging of a clock pendulum, the flow of water in ...
, as an example of an
ergodic system Ergodic theory ( Greek: ' "work", ' "way") is a branch of mathematics that studies statistical properties of deterministic dynamical systems; it is the study of ergodicity. In this context, statistical properties means properties which are expre ...
and specifically, a
measure-preserving dynamical system In mathematics, a measure-preserving dynamical system is an object of study in the abstract formulation of dynamical systems, and ergodic theory in particular. Measure-preserving systems obey the Poincaré recurrence theorem, and are a special cas ...
, in one of several different ways. One way is as a
shift space In symbolic dynamics and related branches of mathematics, a shift space or subshift is a set of infinite words that represent the evolution of a discrete system. In fact, shift spaces and '' symbolic dynamical systems'' are often considered synon ...
, and the other is as an odometer. These are reviewed below.


Bernoulli shift

One way to create a dynamical system out of the Bernoulli process is as a
shift space In symbolic dynamics and related branches of mathematics, a shift space or subshift is a set of infinite words that represent the evolution of a discrete system. In fact, shift spaces and '' symbolic dynamical systems'' are often considered synon ...
. There is a natural translation symmetry on the product space \Omega=2^\mathbb given by the
shift operator In mathematics, and in particular functional analysis, the shift operator also known as translation operator is an operator that takes a function to its translation . In time series analysis, the shift operator is called the lag operator. Shift ...
:T(X_0, X_1, X_2, \cdots) = (X_1, X_2, \cdots) The Bernoulli measure, defined above, is translation-invariant; that is, given any cylinder set \sigma\in\mathcal, one has :P(T^(\sigma))=P(\sigma) and thus the Bernoulli measure is a Haar measure; it is an
invariant measure In mathematics, an invariant measure is a measure that is preserved by some function. The function may be a geometric transformation. For examples, circular angle is invariant under rotation, hyperbolic angle is invariant under squeeze mapping, ...
on the product space. Instead of the probability measure P:\mathcal\to\mathbb, consider instead some arbitrary function f:\mathcal\to\mathbb. The
pushforward The notion of pushforward in mathematics is "dual" to the notion of pullback, and can mean a number of different but closely related things. * Pushforward (differential), the differential of a smooth map between manifolds, and the "pushforward" op ...
:f\circ T^ defined by \left(f\circ T^\right)(\sigma) = f(T^(\sigma)) is again some function \mathcal\to\mathbb. Thus, the map T induces another map \mathcal_T on the space of all functions \mathcal\to\mathbb. That is, given some f:\mathcal\to\mathbb, one defines :\mathcal_T f = f \circ T^ The map \mathcal_T is a linear operator, as (obviously) one has \mathcal_T(f+g)= \mathcal_T(f) + \mathcal_T(g) and \mathcal_T(af)= a\mathcal_T(f) for functions f,g and constant a. This linear operator is called the
transfer operator Transfer may refer to: Arts and media * ''Transfer'' (2010 film), a German science-fiction movie directed by Damir Lukacevic and starring Zana Marjanović * ''Transfer'' (1966 film), a short film * ''Transfer'' (journal), in management studies ...
or the ''Ruelle–Frobenius–Perron operator''. This operator has a
spectrum A spectrum (plural ''spectra'' or ''spectrums'') is a condition that is not limited to a specific set of values but can vary, without gaps, across a continuum. The word was first used scientifically in optics to describe the rainbow of colors ...
, that is, a collection of
eigenfunction In mathematics, an eigenfunction of a linear operator ''D'' defined on some function space is any non-zero function f in that space that, when acted upon by ''D'', is only multiplied by some scaling factor called an eigenvalue. As an equation, th ...
s and corresponding eigenvalues. The largest eigenvalue is the Frobenius–Perron eigenvalue, and in this case, it is 1. The associated eigenvector is the invariant measure: in this case, it is the Bernoulli measure. That is, \mathcal_T(P)= P. If one restricts \mathcal_T to act on polynomials, then the eigenfunctions are (curiously) the
Bernoulli polynomial In mathematics, the Bernoulli polynomials, named after Jacob Bernoulli, combine the Bernoulli numbers and binomial coefficients. They are used for series expansion of functions, and with the Euler–MacLaurin formula. These polynomials occur ...
s! This coincidence of naming was presumably not known to Bernoulli.


The 2x mod 1 map

The above can be made more precise. Given an infinite string of binary digits b_0, b_1, \cdots write :y=\sum_^\infty \frac. The resulting y is a real number in the unit interval 0\le y\le 1. The shift T induces a
homomorphism In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces). The word ''homomorphism'' comes from the Ancient Greek language: () meaning "same" ...
, also called T, on the unit interval. Since T(b_0, b_1, b_2, \cdots) = (b_1, b_2, \cdots), one can easily see that T(y)=2y\bmod 1. This map is called the
dyadic transformation The dyadic transformation (also known as the dyadic map, bit shift map, 2''x'' mod 1 map, Bernoulli map, doubling map or sawtooth map) is the mapping (i.e., recurrence relation) : T: , 1) \to , 1)^\infty : x \mapsto (x_0, x_1, x_2, ...
; for the doubly-infinite sequence of bits \Omega=2^\mathbb, the induced homomorphism is the Baker's map. Consider now the space of functions in y. Given some f(y) one can easily find that :\left[\mathcal_T f\right](y) = \fracf\left(\frac\right)+\fracf\left(\frac\right) Restricting the action of the operator \mathcal_T to functions that are on polynomials, one finds that it has a discrete spectrum given by :\mathcal_T B_n= 2^B_n where the B_n are the Bernoulli polynomials. Indeed, the Bernoulli polynomials obey the identity :\fracB_n\left(\frac\right)+\fracB_n\left(\frac\right) = 2^B_n(y)


The Cantor set

Note that the sum :y=\sum_^\infty \frac gives the
Cantor function In mathematics, the Cantor function is an example of a function that is continuous, but not absolutely continuous. It is a notorious counterexample in analysis, because it challenges naive intuitions about continuity, derivative, and measure. ...
, as conventionally defined. This is one reason why the set \^\mathbb is sometimes called the Cantor set.


Odometer

Another way to create a dynamical system is to define an odometer. Informally, this is exactly what it sounds like: just "add one" to the first position, and let the odometer "roll over" by using
carry bit In computer processors the carry flag (usually indicated as the C flag) is a single bit in a system status register/flag register used to indicate when an arithmetic carry or borrow has been generated out of the most significant arithmetic logic ...
s as the odometer rolls over. This is nothing more than base-two addition on the set of infinite strings. Since addition forms a
group (mathematics) In mathematics, a group is a set and an operation that combines any two elements of the set to produce a third element of the set, in such a way that the operation is associative, an identity element exists and every element has an inverse. ...
, and the Bernoulli process was already given a topology, above, this provides a simple example of a
topological group In mathematics, topological groups are logically the combination of groups and topological spaces, i.e. they are groups and topological spaces at the same time, such that the continuity condition for the group operations connects these two st ...
. In this case, the transformation T is given by :T\left(1,\dots,1,0,X_,X_,\dots\right) = \left(0,\dots,0,1,X_,X_,\dots \right). It leaves the Bernoulli measure invariant only for the special case of p=1/2 (the "fair coin"); otherwise not. Thus, T is a measure preserving dynamical system in this case, otherwise, it is merely a
conservative system In mathematics, a conservative system is a dynamical system which stands in contrast to a dissipative system. Roughly speaking, such systems have no friction or other mechanism to dissipate the dynamics, and thus, their phase space does not shrink o ...
.


Bernoulli sequence

The term Bernoulli sequence is often used informally to refer to a realization of a Bernoulli process. However, the term has an entirely different formal definition as given below. Suppose a Bernoulli process formally defined as a single random variable (see preceding section). For every infinite sequence ''x'' of coin flips, there is a
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is calle ...
of integers :\mathbb^x = \ \, called the Bernoulli sequence associated with the Bernoulli process. For example, if ''x'' represents a sequence of coin flips, then the associated Bernoulli sequence is the list of natural numbers or time-points for which the coin toss outcome is ''heads''. So defined, a Bernoulli sequence \mathbb^x is also a random subset of the index set, the natural numbers \mathbb. Almost all Bernoulli sequences \mathbb^x are ergodic sequences.


Randomness extraction

From any Bernoulli process one may derive a Bernoulli process with ''p'' = 1/2 by the von Neumann extractor, the earliest
randomness extractor A randomness extractor, often simply called an "extractor", is a function, which being applied to output from a weakly random entropy source, together with a short, uniformly random seed, generates a highly random output that appears independent fro ...
, which actually extracts uniform randomness.


Basic von Neumann extractor

Represent the observed process as a sequence of zeroes and ones, or bits, and group that input stream in non-overlapping pairs of successive bits, such as (11)(00)(10)... . Then for each pair, * if the bits are equal, discard; * if the bits are not equal, output the first bit. This table summarizes the computation. For example, an input stream of eight bits 10011011 would by grouped into pairs as (10)(01)(10)(11). Then, according to the table above, these pairs are translated into the output of the procedure: (1)(0)(1)() (=101). In the output stream 0 and 1 are equally likely, as 10 and 01 are equally likely in the original, both having probability ''p''(1−''p'') = (1−''p'')''p''. This extraction of uniform randomness does not require the input trials to be independent, only
uncorrelated In probability theory and statistics, two real-valued random variables, X, Y, are said to be uncorrelated if their covariance, \operatorname ,Y= \operatorname Y- \operatorname \operatorname /math>, is zero. If two variables are uncorrelated, ther ...
. More generally, it works for any exchangeable sequence of bits: all sequences that are finite rearrangements are equally likely. The von Neumann extractor uses two input bits to produce either zero or one output bits, so the output is shorter than the input by a factor of at least 2. On average the computation discards proportion ''p''2 + (1 − ''p'')2 of the input pairs(00 and 11), which is near one when ''p'' is near zero or one, and is minimized at 1/4 when ''p'' = 1/2 for the original process (in which case the output stream is 1/4 the length of the input stream on average). Von Neumann (classical) main operation pseudocode: if (Bit1 ≠ Bit2)


Iterated von Neumann extractor

This decrease in efficiency, or waste of randomness present in the input stream, can be mitigated by iterating the algorithm over the input data. This way the output can be made to be "arbitrarily close to the entropy bound". The iterated version of the von Neumann algorithm, also known as Advanced Multi-Level Strategy (AMLS), was introduced by Yuval Peres in 1992. It works recursively, recycling "wasted randomness" from two sources: the sequence of discard/non-discard, and the values of discarded pairs (0 for 00, and 1 for 11). Intuitively, it relies on the fact that, given the sequence already generated, both of those sources are still exchangeable sequences of bits, and thus eligible for another round of extraction. While such generation of additional sequences can be iterated infinitely to extract all available entropy, an infinite amount of computational resources is required, therefore the number of iterations is typically fixed to a low value – this value either fixed in advance, or calculated at runtime. More concretely, on an input sequence, the algorithm consumes the input bits in pairs, generating output together with two new sequences: (If the length of the input is odd, the last bit is completely discarded.) Then the algorithm is applied recursively to each of the two new sequences, until the input is empty. Example: The input stream from above, 10011011, is processed this way: From the step of 1 on, the input becomes the new sequence1 of the last step to move on in this process. The output is therefore (101)(1)(0)()()() (=10110), so that from the eight bits of input five bits of output were generated, as opposed to three bits through the basic algorithm above. The constant output of exactly 2 bits per round (compared with a variable 0 to 1 bits in classical VN) also allows for constant-time implementations which are resistant to timing attacks. Von Neumann–Peres (iterated) main operation pseudocode: if (Bit1 ≠ Bit2) else Another tweak was presented in 2016, based on the observation that the Sequence2 channel doesn't provide much throughput, and a hardware implementation with a finite number of levels can benefit from discarding it earlier in exchange for processing more levels of Sequence1.


References


Further reading

* Carl W. Helstrom, ''Probability and Stochastic Processes for Engineers'', (1984) Macmillan Publishing Company, New York .


External links


Using a binary tree diagram for describing a Bernoulli process
{{Stochastic processes Stochastic processes